Abstract. Given a set $S$ of $n$ points in the plane, we consider the
problem of answering range selection queries on $S$: that is, given an
arbitrary $x$-range $Q$ and an integer $k > 0$, return the $k$-th smallest
$y$-coordinate from the set of points that have $x$-coordinates in $Q$. We
present a linear space data structure that maintains a dynamic set of $n$
points in the plane with real coordinates, and supports range selection
queries in $O((\lg n / \lg\lg n)^2)$ time, as well as insertions and deletions in
$O((\lg n / \lg\lg n)^2)$ amortized time. The space usage of this data structure
is an $O(\lg n / \lg\lg n)$ factor improvement over the previous best result,
while maintaining asymptotically matching query and update times. We
also present the first succinct data structure that supports range selection
queries on a dynamic array of $n$ values drawn from a bounded universe.



1 Introduction 

The problem of finding the median value in a data set is a staple problem in 
computer science, and is given a thorough treatment in modern textbooks [5J. In 
this paper we study a dynamic data structure variant of this problem in which 
we are given a set S of n points in the plane. The dynamic range median problem 
is to construct a data structure to represent S such that we can support range 
median queries: that is, given an arbitrary range $Q = [x_1, x_2]$, return the median
$y$-coordinate from the set of points that have $x$-coordinates in $Q$. Furthermore,
the data structure must support insertions of points into, as well as deletions
from, the set $S$. We may also generalize our data structure to support range
selection queries: that is, given an arbitrary $x$-range $Q = [x_1, x_2]$ and an integer
$k > 0$, return the $k$-th smallest $y$-coordinate from the set of points that have
$x$-coordinates in $Q$.
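To make the query semantics concrete, the following is a minimal brute-force sketch of range selection and range median. It is a reference for what the queries return, not the data structure of this paper. The function names are hypothetical, ranks are assumed to be 1-based, and the median is assumed to be the lower median.

```python
def range_select_naive(points, x1, x2, k):
    """Return the k-th smallest y among points with x1 <= x <= x2 (k is 1-based)."""
    ys = sorted(y for x, y in points if x1 <= x <= x2)
    if not 1 <= k <= len(ys):
        raise ValueError("k is out of range for this query")
    return ys[k - 1]


def range_median_naive(points, x1, x2):
    """Return the lower median y among points with x1 <= x <= x2 (an assumed convention)."""
    count = sum(1 for x, _ in points if x1 <= x <= x2)
    return range_select_naive(points, x1, x2, (count + 1) // 2)
```

This takes time linear (up to sorting) in the number of points per query, which is the cost the polylogarithmic structures discussed here are designed to avoid.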

In addition to being a challenging theoretical problem, the range median
and selection problems have several practical applications in the areas of image
processing, Internet advertising, network traffic analysis, and measuring
real-estate prices in a region [12].

In previous work, the data structures designed for the above problems that 
support queries and updates in polylogarithmic time require superlinear space [5] . 
In this paper, we focus on designing linear space dynamic range selection data 
structures, without sacrificing query or update time. We also consider the
problem of designing succinct data structures that support range selection queries
on a dynamic array of values, drawn from a bounded universe: here "succinct"
means that the space occupied by our data structure is close to the
information-theoretic lower bound of representing the array of values [15].

1.1 Previous Work 

We now summarize the previous results on both the static and dynamic versions 
of the range median and selection problems. 

Static Case: The static range median and selection problems have been studied
heavily in recent years [3,17,12,21,22,9,10,4,5,16]. In these problems we consider
the $n$ points to be in an array: that is, the points have $x$-coordinates $\{1, \ldots, n\}$.
We now summarize the upper and lower bounds for the static problem.

For exact range medians in constant time, there have been several iterations
of near-quadratic space data structures [17,21,22]. For linear space data
structures, Gfeller and Sanders [10] showed that range median queries could be
supported in $O(\lg n)$ time^1, and Gagie et al. [9] showed that selection queries could
be supported in $O(\lg \sigma)$ time using a wavelet tree, where $\sigma$ is the number of
distinct $y$-coordinates in the set of points. Optimal upper bounds of $O(\lg n / \lg\lg n)$
time for range median queries have since been achieved by Brodal et al. [4,5], and
lower bounds by Jørgensen and Larsen [16]: the latter proved a cell-probe lower
bound of $\Omega(\lg n / \lg\lg n)$ time for any static range selection data structure using
$O(n \lg^{O(1)} n)$ bits of space. In the case of range selection when $k$ is fixed for all
queries, Jørgensen and Larsen proved a cell-probe lower bound of $\Omega(\lg k / \lg\lg n)$
time for any data structure using $O(n \lg^{O(1)} n)$ space [16]. Furthermore, they presented
an adaptive data structure for range selection, where $k$ is given at query
time, that matches their lower bound, except when $k = 2^{o(\lg\lg n)}$ [16].

Finally, Bose et al. [3] studied the problem of finding approximate range
medians. A $c$-approximate median of range $[i..j]$ is a value of rank between
$\frac{1}{c} \times \lceil \frac{j-i+1}{2} \rceil$ and $(2 - \frac{1}{c}) \times \lceil \frac{j-i+1}{2} \rceil$, for $c > 1$.

Dynamic Case: Gfeller and Sanders [10] presented an $O(n \lg n)$ space data
structure for the range median problem that supports queries in $O(\lg^2 n)$ time and
insertions and deletions in $O(\lg^2 n)$ amortized time.

Later, Brodal et al. [4,5] presented an $O(n \lg n / \lg\lg n)$ space data
structure for the dynamic range selection problem that answers range queries in
$O((\lg n / \lg\lg n)^2)$ time and supports insertions and deletions in
$O((\lg n / \lg\lg n)^2)$ amortized time. They also show a reduction from the marked
ancestor problem [1] to the dynamic range median problem. This reduction shows that
$\Omega(\lg n / \lg\lg n)$ query time is required for any data structure with
polylogarithmic update time [8]. Thus, there is still a gap of $\Theta(\lg n / \lg\lg n)$
time between the upper and lower bounds for linear and near linear space data structures.



1 In this paper we use $\lg n$ to denote $\log_2 n$.



1.2 Our Results 



In the remainder of this paper we assume the word-RAM model of computation
with word size $w = \Omega(\lg n)$ bits. In Section 2 we present a linear space
data structure for the dynamic range selection problem that answers queries in
$O((\lg n / \lg\lg n)^2)$ time, and performs insertions and deletions in
$O((\lg n / \lg\lg n)^2)$ amortized time. This data structure can be used to represent
point sets in which the points have "real" coordinates. In other words, we only
assume that the coordinates of the points can be compared in constant time. This
improves the space usage of the previous best data structure by a factor of
$\Theta(\lg n / \lg\lg n)$ [5], while maintaining the query and update time.

In Section 3 we present the first succinct data structure that supports range
selection queries on a dynamic array $A$ of values drawn from a bounded
universe $[1..\sigma]$. The data structure occupies $nH_0(A) + o(n \lg \sigma) + O(w)$ bits^2, and
supports queries in $O(\frac{\lg n}{\lg\lg n}(\frac{\lg \sigma}{\lg\lg n} + 1))$ time, and insertions and deletions in
$O(\frac{\lg n}{\lg\lg n}(\frac{\lg \sigma}{\lg\lg n} + 1))$ amortized time. In the worst case, this space cost is little
more than the $n \lg \sigma$ bits required to encode the array of values.

2 Linear Space Data Structure 

In this section we describe a linear space data structure for the dynamic range
selection problem. Our data structure follows the same general approach as the
dynamic data structure of Brodal et al. [5]. However, we make several important
changes, and use several other auxiliary data structures, in order to improve the
space by a factor of $\Theta(\lg n / \lg\lg n)$.

The main data structure is a weight balanced B-tree [2], $T$, with branching
parameter $\Theta(\lg^{\varepsilon_1} n)$, for $0 < \varepsilon_1 < 1/2$, and leaf parameter 1. The tree $T$ stores
the points in $S$ at its leaves, sorted in non-decreasing order of $y$-coordinate^3.
The height of $T$ is $h_1 = \Theta(\lg n / \lg\lg n)$ levels, and we assign numbers to the
levels starting with level 1, which contains the root node, down to level $h_1$, which
contains the leaves of $T$. Inside each internal node $v \in T$, we store the smallest
and largest $y$-coordinates in $T(v)$. Using these values we can acquire the path
from the root of $T$ to the leaf representing an arbitrary point contained in $S$
in $O(\lg n)$ time; a binary search over the values stored in the children of an
arbitrary internal node requires $O(\lg\lg n)$ time per level.

Following Brodal et al. [5], we store a ranking tree $R(v)$ inside each internal
node $v \in T$. The purpose of the ranking tree $R(v)$ is to allow us to efficiently
make a branching decision in the main tree $T$, at node $v$. Let $T(v)$ denote the
subtree rooted at node $v$. The ranking tree $R(v)$ represents all of the points
stored in the leaves of $T(v)$, sorted in non-decreasing order of $x$-coordinate.
The fundamental difference between our ranking trees and those of Brodal et
al. [5] is that ours are more space efficient. Specifically, in order to achieve linear
2 $H_0(A)$ denotes the 0th-order empirical entropy of the set of values stored in $A$.

3 Throughout this paper, whenever we order a list based on $y$-coordinate, it is assumed
that we break ties using the $x$-coordinate, and vice versa.



space, we must ensure that the ranking trees stored in each level of $T$ occupy
no more than $O(n \lg\lg n)$ bits in total, since there are $O(\lg n / \lg\lg n)$ levels in
$T$. We describe the ranking trees in detail in Section 2.1, but first discuss some
auxiliary data structures we require in addition to $T$.

We construct a red-black tree $S_x$ that stores the points in $S$ at its leaves,
sorted in non-decreasing order of $x$-coordinate. As in [5], we augment the
red-black tree $S_x$ to store, in each node $v$, the count of how many points are stored in
$T(v_1)$ and $T(v_2)$, where $v_1$ and $v_2$ are the two children of $v$. Using these counts,
$S_x$ can be used to map any query $[x_1, x_2]$ into $r_1$, the rank of the successor of
$x_1$ in $S$, and $r_2$, the rank of the predecessor of $x_2$ in $S$. These ranking queries,
as well as insertions and deletions into $S_x$, take $O(\lg n)$ time.
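The mapping from a query range to ranks can be illustrated with a small sketch. Here a sorted Python list stands in for the augmented red-black tree (an assumption made for brevity; the list does not support logarithmic-time updates), the function name is hypothetical, and ranks are assumed to be 1-based.

```python
from bisect import bisect_left, bisect_right


def query_to_ranks(sorted_xs, x1, x2):
    """Map [x1, x2] to 1-based ranks (r1, r2) in sorted_xs, or None if the range is empty."""
    r1 = bisect_left(sorted_xs, x1) + 1   # rank of the successor of x1 (smallest x >= x1)
    r2 = bisect_right(sorted_xs, x2)      # rank of the predecessor of x2 (largest x <= x2)
    return (r1, r2) if r1 <= r2 else None
```

After this step the query no longer depends on the real-valued coordinates, only on the integer ranks $r_1$ and $r_2$.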



[Figure 1 not reproduced; surviving label: $Y_2 = 4321431223142341$.]

Fig. 1. The top two levels of an example tree $T$, and the corresponding strings $Y_1$ and
$Y_2$ for these levels. Each node at level 2 has exactly 4 children.

We also store a string $Y(v)$ for each node $v$ in $T$. This string consists of all
of the points in $T(v)$ sorted in non-decreasing order of $x$-coordinate, where
each point is represented by the index of the child of node $v$ in whose subtree
it is contained, i.e., an integer bounded by $O(\lg^{\varepsilon_1} n)$. However, for technical
reasons, instead of storing each string with each node $v \in T$, we concatenate all
the strings $Y(v)$ for each node $v$ at level $\ell$ in $T$ into a string of length $n$, denoted
by $Y_\ell$. Each chunk of string $Y_\ell$ from left to right represents some node $v$ in level
$\ell$ of $T$ from left to right within the level. See Figure 1 for an illustration. We
represent each string $Y_\ell$ using the dynamic succinct data structure for representing
strings of He and Munro [14]. Depending on the context, we refer to both the
string, and also the data structure that represents the string, as $Y_\ell$. Consider
the following operations on the string $Y_\ell$.

— $\mathrm{access}(Y_\ell, i)$, which returns the $i$-th integer, $Y_\ell[i]$, in $Y_\ell$;

— $\mathrm{rank}_\alpha(Y_\ell, i)$, which returns the number of occurrences of integer $\alpha$ in $Y_\ell[1..i]$;

— $\mathrm{range\_count}(Y_\ell, x_1, x_2, y_1, y_2)$, which returns the total number of entries in
$Y_\ell[x_1..x_2]$ whose values are in the range $[y_1..y_2]$;

— $\mathrm{insert}_\alpha(Y_\ell, i)$, which inserts integer $\alpha$ between $Y_\ell[i-1]$ and $Y_\ell[i]$;

— $\mathrm{delete}(Y_\ell, i)$, which deletes $Y_\ell[i]$ from $Y_\ell$.
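The semantics of these five operations can be pinned down with a plain-list reference sketch. This is only meant to fix the meaning of each operation with 1-based positions; it has none of the space or time guarantees of the succinct structure, and the class name is hypothetical.

```python
class NaiveString:
    """Reference semantics for the operations on a level string, using 1-based positions."""

    def __init__(self, symbols=()):
        self.data = list(symbols)

    def access(self, i):
        return self.data[i - 1]

    def rank(self, a, i):
        # Number of occurrences of symbol a in Y[1..i].
        return self.data[:i].count(a)

    def range_count(self, x1, x2, y1, y2):
        # Number of entries of Y[x1..x2] whose values lie in [y1..y2].
        return sum(1 for s in self.data[x1 - 1:x2] if y1 <= s <= y2)

    def insert(self, a, i):
        # Place a between Y[i-1] and Y[i], so that a becomes the new Y[i].
        self.data.insert(i - 1, a)

    def delete(self, i):
        del self.data[i - 1]
```

A structure like this is handy as a test oracle when checking a more elaborate implementation of the same interface.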



Let $W = \lceil \lg^2 n / \lg\lg n \rceil$. The following lemma summarizes the functionality of
these data structures for succinct dynamic strings over a small universe:

Lemma 1 ([14]). Under the word RAM model with word size $w = \Omega(\lg n)$, a
string $Y_\ell[1..n]$ of values from a bounded universe $[1..\sigma]$, where $\sigma = O(\lg^{\mu} n)$ for
any constant $\mu \in (0, 1)$, can be represented using $nH_0(Y_\ell) + O(\ldots) + O(w)$
bits to support access, rank, range_count, insert and delete in $O(\lg n / \lg\lg n)$
time. Furthermore, we can perform a batch of $m$ update operations in $O(m)$
time on a substring $Y_\ell[i..i + m - 1]$ in which the $j$-th update operation changes
the value of $Y_\ell[i + j - 1]$, provided that $m > \ldots$.

The data structure summarized by the previous lemma is, roughly, a B-tree
constructed over the string $Y_\ell[1..n]$, in which each leaf stores a superblock, which
is a substring of $Y_\ell[1..n]$ of length at most $2W$ bits. We mention this because the
ranking tree stored in each node of $T$ will implicitly reference these superblocks
instead of storing leaves. Thus, the leaves of the dynamic string at level $\ell$ are
shared with the ranking trees stored in nodes at level $\ell$.

As for their functionality, these dynamic string data structures $Y_\ell$ are used to
translate the ranks $r_1$ and $r_2$ into ranks within a restricted subset of the points
when we navigate a path from the root of $T$ to a leaf. The space occupied by
these strings is $O((n \lg(\lg^{\varepsilon_1} n) + w) \times \lg n / \lg\lg n)$ bits, which is $O(n)$ words. We
present the following lemma:

Lemma 2. Ignoring the ranking trees stored in each node of $T$, the data
structures described in this section occupy $O(n)$ words.
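As a quick check of the space bound for the level strings, the calculation can be written out as follows. This is a sketch that counts $\lg(\lg^{\varepsilon_1} n) = \varepsilon_1 \lg\lg n$ bits per symbol and ignores lower-order redundancy terms; it relies only on $w = \Omega(\lg n)$.

```latex
\begin{align*}
\bigl(n \lg(\lg^{\varepsilon_1} n) + w\bigr) \cdot \frac{\lg n}{\lg\lg n}
  &= \bigl(\varepsilon_1 n \lg\lg n + w\bigr) \cdot \frac{\lg n}{\lg\lg n} \\
  &= \varepsilon_1 n \lg n + \frac{w \lg n}{\lg\lg n} \ \text{bits} \\
  &= O\!\left(\frac{n \lg n}{w} + \frac{\lg n}{\lg\lg n}\right) \ \text{words}
   \;=\; O(n) \ \text{words}.
\end{align*}
```

The first term is $O(n)$ words because $w = \Omega(\lg n)$, and the second term is dominated by $n$.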

In the next section we discuss the technical details of our space-efficient
ranking tree. The key idea to avoid using linear space per ranking tree is to
not actually store the points in the leaves of the ranking tree, sorted in
non-decreasing order of $x$-coordinate. Instead, for each point $p$ in ranking tree $R(v)$,
we implicitly reference the string $Y(v)$, which stores the index of the child
of $v$ that contains $p$.

2.1 Space Efficient Ranking Trees 

Each ranking tree $R(v)$ is a weight balanced B-tree with branching
parameter $\lg^{\varepsilon_2} n$, where $0 < \varepsilon_2 < 1 - \varepsilon_1$, and leaf parameter $\Theta(W / \lg\lg n) =
\Theta((\lg n / \lg\lg n)^2)$. Thus, $R(v)$ has height $O(\lg n / \lg\lg n)$, and each leaf implicitly
represents a substring of $Y(v)$, which is actually stored in one of the dynamic
strings, $Y_\ell$.

Internal Nodes: Inside each internal node $u$ in $R(v)$, let $q_i$ denote the number
of points stored in the subtree rooted at the $i$-th child of $u$, for $1 \le i \le f_2$,
where $f_2$ is the degree of $u$. We store a searchable partial sums structure [23]
for the sequence $Q = \{q_1, \ldots, q_{f_2}\}$. This data structure will allow us to efficiently
navigate from the root of $R(v)$ to the leaf containing the point of $x$-coordinate
rank $r$. The functionality of this data structure is summarized in the following
lemma:



Lemma 3 ([23]). Suppose the word size is $\Omega(\lg n)$. A sequence $Q$ of $O(\lg^{\mu} n)$
nonnegative integers of $O(\lg n)$ bits each, for any constant $\mu \in (0, 1)$, can be
represented in $O(\lg^{1+\mu} n)$ bits and support the following operations in $O(1)$ time:

— $\mathrm{sum}(Q, i)$, which returns $\sum_{j=1}^{i} Q[j]$,

— $\mathrm{search}(Q, x)$, which returns the smallest $i$ such that $\mathrm{sum}(Q, i) \ge x$,

— $\mathrm{modify}(Q, i, \delta)$, which sets $Q[i]$ to $Q[i] + \delta$, where $|\delta| \le \lg n$.

This data structure can be constructed in $O(\lg^{\mu} n)$ time, and it requires a
precomputed universal table of size $O(n^{\mu'})$ bits for any fixed $\mu' > 0$.
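The three operations of the lemma can be illustrated with a linear-time reference sketch. It does not achieve constant time and uses no universal table; it only fixes the semantics, under the assumption that search returns the smallest index whose prefix sum is at least $x$. The class name is hypothetical and indices are 1-based.

```python
class NaivePartialSums:
    """Reference semantics for sum, search and modify, with 1-based indices."""

    def __init__(self, values):
        self.q = list(values)

    def sum(self, i):
        # Prefix sum Q[1] + ... + Q[i].
        return sum(self.q[:i])

    def search(self, x):
        # Smallest i with sum(i) >= x, or None if the total is below x.
        running = 0
        for i, value in enumerate(self.q, start=1):
            running += value
            if running >= x:
                return i
        return None

    def modify(self, i, delta):
        self.q[i - 1] += delta
```

In a ranking tree node, search with the target rank picks the child to descend into, and the prefix sum of the preceding children is subtracted from the rank before recursing.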

We also store the matrix structure of Brodal et al. [5] in each internal
node $u$ of the ranking tree. Let $f_1 = \Theta(\lg^{\varepsilon_1} n)$ denote the out-degree of node
$v \in T$, and let $T(v_1), \ldots, T(v_{f_1})$ denote the subtrees rooted at the children of $v$
from left to right. Similarly, recall that $f_2 = \Theta(\lg^{\varepsilon_2} n)$ denotes the out-degree
of $u \in R(v)$, and let $T'(u_1), \ldots, T'(u_{f_2})$ be the subtrees rooted at each child of
$u$ from left to right. These matrix structures are a kind of partial sums data
structure defined as follows; we use roughly the same notation as [5]:

Definition 1 (Summarizes [5]). A matrix structure $M^u$ is an $f_1 \times f_2$ matrix,
where entry $M^u_{p,q}$ stores the number of points from $\bigcup_{i=1}^{q} T'(u_i)$ that are contained
in $\bigcup_{j=1}^{p} T(v_j)$. The matrix structure $M^u$ is stored in two ways. The first
representation is a standard table, where each entry is stored in $O(\lg n)$ bits. In the
second representation, we divide each column into sections of $\Theta(\lg^{\varepsilon_1} n)$ bits,
leaving $\Theta(\lg\lg n)$ bits of overlap between the sections for technical reasons,
and we number the sections $s_1, \ldots, s_g$, where $g = O(\lg^{1-\varepsilon_1} n)$. In the second
representation, for each column $q$, there is a packed word $w^u_{q,i}$, storing section $s_i$
of each entry in column $q$. Again, for technical reasons, the most significant bit
of each section stored in the packed word $w^u_{q,i}$ is padded with a zero bit.
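The first (table) representation of the matrix can be illustrated with a small construction sketch. It assumes each child of $u$ is given as the list of symbols (child indices of $v$, in $1..f_1$) of the points below it; the function name and input layout are hypothetical, and the packed second representation is not modelled.

```python
def build_matrix(children_of_u, f1):
    """Build M with M[p][q] = number of points in the first q children of u
    whose symbol (child index of v) is at most p. Indices p and q are 1-based;
    row 0 and column 0 are kept as zeros for convenience."""
    f2 = len(children_of_u)
    M = [[0] * (f2 + 1) for _ in range(f1 + 1)]
    for q, symbols in enumerate(children_of_u, start=1):
        counts = [0] * (f1 + 1)
        for s in symbols:
            counts[s] += 1
        running = 0
        for p in range(1, f1 + 1):
            running += counts[p]              # points of child q with symbol <= p
            M[p][q] = M[p][q - 1] + running   # extend the prefix over children 1..q
    return M
```

Because every entry is a two-dimensional prefix count, the number of points in any run of consecutive children of $u$ that fall in the first $p$ children of $v$ is a difference of two entries in row $p$.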

We defer the description of how the matrix structures are used to guide
queries until Section 2.2. For now, we just treat these structures as a black box
and summarize their properties with the following lemma:

Lemma 4 ([5]). The matrix structure $M^u$ for node $u$ in the ranking tree $R(v)$
occupies $O(\lg^{1+\varepsilon_1+\varepsilon_2} n)$ bits, and can be constructed in $o(\lg^{1+\varepsilon_1+\varepsilon_2} n)$ time.
Furthermore, consider an update path that goes through node $u$ when we insert a
value into or delete a value from $R(v)$. The matrix structures in each node along
an update path can be updated in $O(1)$ amortized time per node.

Shared Leaves: Now that we have described the internal nodes of the ranking
tree, we describe how the leaves are shared between $R(v)$ and the dynamic
string over $Y_\ell$. To be absolutely explicit, we do not actually store the leaves of
$R(v)$: they are only conceptual. We present the following lemma; the proof can
be found in Appendix A.



Lemma 5. Let $u$ be a leaf in $R(v)$ and $S$ be the substring of $Y(v)$ that $u$
represents, where each value in $S$ is in the range $[1..\sigma]$, and $\sigma = \Theta(\lg^{\varepsilon_1} n)$. Using
a universal table of size $O(\sqrt{n} \times \mathrm{polylog}(n))$ bits, for any $z \in [1..|S|]$, an array
$C_z = \{c_1, \ldots, c_\sigma\}$ can be computed in $O(\lg n / \lg\lg n)$ time, where $c_i = \mathrm{rank}_i(S, z)$,
for $1 \le i \le \sigma$.

We now present the following lemma regarding the space and construction
time of the ranking trees; see Appendix B for the proof:

Lemma 6. Each ranking tree $R(v)$ occupies $O(m (\lg\lg n)^2 / \lg^{1-\varepsilon_1} n + w)$ bits of space if
$|T(v)| = m$, and requires $O(m)$ time to construct, assuming that we have access
to the string $Y(v)$.

Remark 1. Note that the discussion in this section implies that we need not
store ranking trees for nodes $v \in T$, where $|T(v)| = O((\lg n / \lg\lg n)^2)$. Instead, we
can directly query the dynamic string $Y_\ell$ using Lemma 5 in $O(\lg n / \lg\lg n)$ time
to make a branching decision in $T$. This will be important in Section 3, since it
significantly reduces the number of pointers we need.

2.2 Answering Queries 

In this section, we explain how to use our space efficient ranking tree in order 
to guide a range selection query in T. 

We are given a query $[x_1, x_2]$ as well as a rank $k$, and our goal is to return the
$k$-th smallest $y$-coordinate in the query range. We begin our search at the root
node $v$ of the tree $T$. In order to guide the search to the correct child of $v$, we
determine the canonical set of nodes in $R(v)$ that represent the query $[x_1, x_2]$.
Before we query $R(v)$, we search for $x_1$ and $x_2$ in $S_x$. Let $r_1$ and $r_2$ denote the
ranks of the successor of $x_1$ and predecessor of $x_2$ in $S$, respectively. We query
$R(v)$ using $[r_1, r_2]$, and use the searchable partial sum data structures stored in
each node of $R(v)$, to identify the canonical set of nodes in $R(v)$ that represent
the interval $[r_1, r_2]$. At this point we outline how to use the matrix structures in
order to decide how to branch in $T$.

Matrix Structures: We discuss a straightforward, slow method of computing the
branch of the child of $v$ to follow. The full details of the faster method can be
found in Appendix C as well as the original paper [5].

In order to determine the child of $v$ that contains the $k$-th smallest
$y$-coordinate in the query range, recall that $T$ is sorted by $y$-coordinate. Let $f_1$
denote the degree of $v$, and $q_i$ denote the number of points that are contained in
the range $[x_1, x_2]$ in the subtree rooted at the $i$-th child of $v$, for $1 \le i \le f_1$.
Determining the child that contains the $k$-th smallest $y$-coordinate in $[x_1, x_2]$ is
equivalent to computing the value $\tau$ such that $\sum_{i=1}^{\tau-1} q_i < k$ and $\sum_{i=1}^{\tau} q_i \ge k$. In order
to compute $\tau$, we use the matrix structures in each internal node of the
canonical set of nodes, $C$, that represent $[x_1, x_2]$. The set $C$ contains $O(\lg n / \lg\lg n)$
internal nodes, as well as at most two leaf nodes.

Consider any internal node $u \in C$, and without loss of generality, suppose $u$
was on the search path for $r_1$, but not the search path for $r_2$, and that $u$ has
degree $f_2$. If the search path for $r_1$ goes through child $c_q$ in $u$, then consider the
difference between columns $f_2$ and $q$ in the first representation of matrix $M^u$. We
denote this difference as $M'^u$, where $M'^u_i = M^u_{i,f_2} - M^u_{i,q}$, for $1 \le i \le f_1$. For each
internal node $u \in C$ we add each $M'^u$ to a running total, and denote the overall
sum as $M'$. Next, for each of the (at most two) leaves on the search path,
we query the superblocks of $Y_\ell$ to get the relevant portions of the sums, and add
them to $M'$. At this point, $M'_i = q'_i$, where $q'_i$ is the number of points in the query
range stored in the first $i$ children of $v$, and it is a simple matter to scan each entry
in $M'$ to determine the value of $\tau$. Since each matrix structure has $f_1$ entries in
its columns, this overall process takes $O(f_1 \times \lg n / \lg\lg n) = O(\lg^{1+\varepsilon_1} n / \lg\lg n)$
time, since there are $O(\lg n / \lg\lg n)$ levels in $R(v)$. Since there are $O(\lg n / \lg\lg n)$
levels in $T$, this costs $O((\lg n / \lg\lg n)^2 \times \lg^{\varepsilon_1} n)$ time. This time bound can be
reduced by a factor of $f_1 = O(\lg^{\varepsilon_1} n)$, using word-level parallelism and the second
representation of the matrix structures [5].
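The slow branching method can be sketched in a few lines. The sketch assumes the table representation of a matrix is available as a list of rows with 1-based indices (row 0 and column 0 unused or zero), and that the per-node column differences have already been summed into a vector of cumulative counts. Both function names are hypothetical.

```python
def column_difference(M, q_lo, q_hi, f1):
    """Row-wise M[i][q_hi] - M[i][q_lo] for i = 1..f1 (index 0 of the result is unused)."""
    return [0] + [M[i][q_hi] - M[i][q_lo] for i in range(1, f1 + 1)]


def choose_child(m_prime, k):
    """Return the smallest tau with m_prime[tau] >= k, where m_prime[i] is the number
    of points of the query range stored in the first i children of v."""
    for tau in range(1, len(m_prime)):
        if m_prime[tau] >= k:
            return tau
    raise ValueError("k exceeds the number of points in the query range")
```

The scan in choose_child touches every entry, which is the source of the extra $f_1$ factor that the packed representation is meant to remove.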

Recursively Searching in T: Let $v_\tau$ denote the $\tau$-th child of $v$. The final detail to
discuss is how we translate the ranks $[r_1, r_2]$ into ranks in the tree $R(v_\tau)$. To do
this, we query the string $Y(v)$ before recursing to $v_\tau$. We use two cases to describe
how to find $Y(v)$ within $Y_\ell$. In the first case, if $v$ is the root of $T$, then $Y_\ell = Y(v)$.
Otherwise, suppose the range in $Y_{\ell-1}$ that stores the parent $v_p$ of node $v$ begins
at position $z$, and $v$ is the $i$-th child of $v_p$. Let $c_j = \mathrm{range\_count}(Y(v_p), z, z +
|Y(v_p)|, 1, j)$ for $1 \le j \le f_1$. Then, the range in $Y_\ell$ that stores $Y(v)$ is $[z + c_{i-1}, z +
c_i]$. We then query $Y(v)$, and set $r_1 = \mathrm{rank}_\tau(Y(v), r_1)$, $r_2 = \mathrm{rank}_\tau(Y(v), r_2)$,
$k = k - q'_{\tau-1}$, and recurse to $v_\tau$. We present the following lemma, summarizing
the arguments presented thus far:

Lemma 7. The data structures described in this section allow us to answer
range selection queries in $O((\lg n / \lg\lg n)^2)$ time.
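The overall descent can be illustrated with a heavily simplified static sketch. It keeps the main idea (a tree sorted by $y$, a child-index string per node in $x$-order, and rank translation before recursing) but replaces ranking trees, matrix structures and succinct strings with plain list scans, so it has none of the stated time or space bounds. It assumes distinct $x$- and $y$-coordinates, uses 0-based child indices, and makes the off-by-one handling of the left rank explicit; all names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def build(points_by_x, fanout):
    """points_by_x: list of (x, y) sorted by x, with distinct y. Returns a nested dict."""
    if len(points_by_x) == 1:
        return {"leaf": points_by_x[0]}
    ys = sorted(y for _, y in points_by_x)
    size = -(-len(ys) // fanout)                 # ceiling division: points per child
    child_of = {y: i // size for i, y in enumerate(ys)}
    Y = [child_of[y] for _, y in points_by_x]    # child index of each point, in x-order
    groups = [[] for _ in range(max(Y) + 1)]
    for p in points_by_x:
        groups[child_of[p[1]]].append(p)         # each group stays sorted by x
    return {"Y": Y, "children": [build(g, fanout) for g in groups]}


def select(node, r1, r2, k):
    """k-th smallest y among the points of x-rank r1..r2 (1-based, inclusive) in node."""
    while "leaf" not in node:
        Y = node["Y"]
        window = Y[r1 - 1:r2]
        for tau in range(len(node["children"])):
            q = window.count(tau)                # points of the range in child tau
            if k <= q:
                break
            k -= q
        # Translate both ranks into the x-order of the chosen child.
        r1 = Y[:r1 - 1].count(tau) + 1
        r2 = Y[:r2].count(tau)
        node = node["children"][tau]
    return node["leaf"][1]


def range_select(points, x1, x2, k, fanout=4):
    pts = sorted(points)
    xs = [x for x, _ in pts]
    r1 = bisect_left(xs, x1) + 1
    r2 = bisect_right(xs, x2)
    if not 1 <= k <= r2 - r1 + 1:
        raise ValueError("k is out of range for this query")
    return select(build(pts, fanout), r1, r2, k)
```

With a fanout of 2 this degenerates into the familiar wavelet-tree style of range selection; a larger fanout shortens the tree at the price of a more expensive branching decision per node, which is the trade-off the ranking trees address.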

2.3 Handling Updates 

In this section, we describe the algorithm for updating the data structures. We
start by describing how insertions are performed. First, we insert $p$ into $S_x$ and
look up the rank, $r_x$, of $p$'s $x$-coordinate in $S_x$. Next, we use the values stored in
each internal node in $T$ to find $p$'s predecessor by $y$-coordinate, $p'$. We update
the path from $p'$ to the root of $T$. If a node $v$ on this path splits, we must rebuild
the ranking tree in the parent node $v_p$ at level $\ell$, as well as the dynamic string
$Y_\ell$.

Next, we update $T$ in a top-down manner, starting from the root of $T$ and
following the path to the leaf storing $p$. Suppose that at some arbitrary node
$v$ in this path, the path branches to the $j$-th child of $v$, which we denote $v_j$.
We insert the symbol $j$ into its appropriate position in $Y_\ell$. After updating $Y_\ell$
(its leaves in particular), we insert the symbol $j$ into the ranking tree $R(v)$,
at position $r_x$, where $r_x$ is the rank of the $x$-coordinate of $p$ among the points
in $T(v)$. As in $T$, each time a node splits in $R(v)$, we must rebuild the data
structures in the parent node. We then update the nodes along the update path
in $R(v)$ in a top-down manner: each update in $R(v)$ must be processed by all of
the auxiliary data structures in each node along the update path. Thus, in each
internal node, we must update the searchable partial sums data structures, as
well as the matrix structures.

After updating the structures at level $\ell$, we use $Y_\ell$ to map $r_x$ to its appropriate
rank by $x$-coordinate in $T(v_j)$. At this point, we can recurse to $v_j$. In the case of
deletions, we follow the convention of Brodal et al. [5] and use node marking, and
rebuild the entire data structure after $O(n)$ updates. We present the following
theorem; the numerous technical details can be found in Appendix D.

Theorem 1. Given a set $S$ of points in the plane, there is a linear space
dynamic data structure representing $S$ that supports range selection queries for any
range $[x_1, x_2]$ in $O((\lg n / \lg\lg n)^2)$ time, and supports insertions and deletions
in $O((\lg n / \lg\lg n)^2)$ amortized time.

3 Dynamic Arrays 

In this section, we show how to adapt Theorem 1 for the problem of maintaining
a dynamic array $A$ of values drawn from a bounded universe $[1..\sigma]$. A query
consists of a range in the array, $[i..j]$, along with an integer $k > 0$, and the
output is the $k$-th smallest value in the subarray $A[i..j]$. Inserting a value into
position $i$ shifts the position of the values in positions $A[i..n]$ to $A[i+1..n+1]$,
and deletions are analogous. We present the following theorem; the proof can be
found in Appendix E.

Theorem 2. Given an array $A[1..n]$ of values drawn from a bounded universe
$[1..\sigma]$, there is an $nH_0(A) + o(n \lg \sigma) + O(w)$ bit data structure that can support
range selection queries on $A$ in $O(\frac{\lg n}{\lg\lg n}(\frac{\lg \sigma}{\lg\lg n} + 1))$ time, and insertions into,
and deletions from, $A$ in $O(\frac{\lg n}{\lg\lg n}(\frac{\lg \sigma}{\lg\lg n} + 1))$ amortized time. Thus, when $\sigma =
O(\mathrm{polylog}(n))$ this is $O(\frac{\lg n}{\lg\lg n})$ time for queries, and $O(\frac{\lg n}{\lg\lg n})$ amortized time
for insertions and deletions.
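The interface of the dynamic array problem can be fixed with a naive reference sketch. It uses a Python list, so updates and queries take time linear in the array or range length, and it makes no attempt at succinct space. The class name is hypothetical and positions are 1-based.

```python
class NaiveDynamicArray:
    """Reference semantics for the dynamic array problem, with 1-based positions."""

    def __init__(self, values=()):
        self.a = list(values)

    def insert(self, i, value):
        # The old A[i..n] moves to A[i+1..n+1].
        self.a.insert(i - 1, value)

    def delete(self, i):
        # The old A[i+1..n] moves to A[i..n-1].
        del self.a[i - 1]

    def select(self, i, j, k):
        # k-th smallest value in the subarray A[i..j].
        return sorted(self.a[i - 1:j])[k - 1]
```

Unlike the point-set setting, positions here are implicit: an insertion changes the position of every later element, which is why the array is handled through ranks rather than stored coordinates.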

4 Concluding Remarks 

In the same manner as Brodal et al. [5], the data structure we presented can
also support orthogonal range counting queries in the same time bound as range
selection queries. We note that the cell-probe lower bounds for the static range
median and static orthogonal range counting match [20,16], and, very recently,
dynamic weighted orthogonal range counting was shown to have a
cell-probe lower bound of $\Omega((\lg n / \lg\lg n)^2)$ query time for any data structure
with polylogarithmic update time [18]. In light of these bounds, it is likely that
$O((\lg n / \lg\lg n)^2)$ time for range median queries is optimal for linear space data
structures with polylogarithmic update time. However, it may be possible to do
better in the case of dynamic range selection, when $k = o(n^{\varepsilon})$, for any $\varepsilon > 0$,
using an adaptive data structure as in the static case [16].
A Proof of Lemma 5



Instead of explicitly storing the leaves of $R(v)$, we use the partial sums structures
along the path from the root of $R(v)$ to the parent of leaf $u$ to produce two ranks,
$r'_1$ and $r'_2$, which are the starting and ending ranks of the substring $S$ represented
by $u$ in $Y_\ell$. Based on the leaf parameter of $R(v)$, and the properties of weight
balanced B-trees [2], $S$ can have length $\Theta(W / \lg\lg n)$. Following the analysis of
how superblocks are laid out in the dynamic string $Y_\ell$, this means that $S$ is
stored in a constant number of consecutive superblocks [14, Section 4].

Given that we know $r'_1$ and $r'_2$, we can acquire a pointer to the first superblock
that stores part of $S$ in $O(\lg n / \lg\lg n)$ time [14, Lemma 7]. Inside each superblock
the substring is further decomposed into a list of blocks of length $O((\lg n)^{3/2})$
bits each, in which only the final block has free space. In order to produce the
array $C_z$, we scan the list of blocks in the superblock up to position $z$, reading
$(\lg n)/2$ bits at a time. Since each value in $S$ occupies $\varepsilon_1 \lg\lg n$ bits, we can
read $O(\lg n / \lg\lg n)$ values at a time. As we read these values, we keep a running
total of the ranks of each value in $S$ up to our current position. Let field $b_i$
denote $\mathrm{rank}_i(S, p)$, where $p$ is our current position within $S$. Clearly, $b_i$ occupies
$O(\lg\lg n)$ bits. Furthermore, let $b = b_1 \ldots b_\sigma$ be the concatenation of these fields.
Thus, the running total, $b$, contains at most $2 \lg^{\varepsilon_1} n \times \varepsilon_1 \lg\lg n$ bits, and can be
stored in a constant number of words.

In order to efficiently update our running total $b$ after reading $(\lg n)/2$
bits from the current block, we perform a lookup in a universal table $A$. Let $\alpha$
denote the $a$-th lexicographically smallest string of length $(\lg n)/2$ bits, over the
bounded universe $[1, \sigma]$. Also, let $f_i(\alpha)$ denote the frequency of symbol $i \in [1, \sigma]$
in string $\alpha$. In each entry of the table $A[a]$, we store the value $b' = f_1(\alpha) \ldots f_\sigma(\alpha)$.
Since both $b$ and $b'$ fit in a constant number of words, we can exploit word-level
parallelism to update the running total by summing all the fields in $b$ and $b'$ in
$O(1)$ time.

The table $A$ occupies $O(2^{(\lg n)/2} \times \lg^{\varepsilon_1} n \times \lg\lg n) = O(\sqrt{n} \times \mathrm{polylog}(n))$
bits, and allows us to process $O(\lg n / \lg\lg n)$ values in $O(1)$ time. Recall that
the entire superblock contains $\Theta((\lg n / \lg\lg n)^2)$ values. Thus, we can return $C_z$
in $O((\lg n / \lg\lg n)^2 \times (\lg\lg n / \lg n)) = O(\lg n / \lg\lg n)$ time.
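The table-lookup idea can be sketched with Python integers standing in for machine words. The sketch assumes chunks of a fixed number of symbols (rather than a fixed number of bits), a dictionary as the universal table, and a field width large enough that no per-symbol count overflows its field; the function names and parameters are hypothetical.

```python
from itertools import product


def build_table(sigma, chunk_len, field_bits):
    """Map every chunk (tuple of symbols in 1..sigma) to a packed word whose i-th
    field holds the frequency of symbol i in the chunk."""
    table = {}
    for chunk in product(range(1, sigma + 1), repeat=chunk_len):
        word = 0
        for s in chunk:
            word += 1 << ((s - 1) * field_bits)
        table[chunk] = word
    return table


def prefix_ranks(S, z, sigma, chunk_len, field_bits, table):
    """Return [rank_1(S, z), ..., rank_sigma(S, z)], requiring z < 2**field_bits."""
    total = 0
    pos = 0
    while pos + chunk_len <= z:
        # One addition updates all sigma fields of the running total at once.
        total += table[tuple(S[pos:pos + chunk_len])]
        pos += chunk_len
    for s in S[pos:z]:                 # leftover symbols, handled one at a time
        total += 1 << ((s - 1) * field_bits)
    mask = (1 << field_bits) - 1
    return [(total >> (i * field_bits)) & mask for i in range(sigma)]
```

The saving comes from the loop body: each iteration consumes a whole chunk with a single lookup and a single addition, independent of the alphabet size.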

B Proof of Lemma 6

The space occupied by the internal nodes is $O(m (\lg\lg n)^2 / \lg^{1-\varepsilon_1} n)$ bits, since
each internal node occupies $O(\lg^{1+\varepsilon_1+\varepsilon_2} n)$ bits by Lemmas 3 and 4, and the
number of internal nodes is $O(m / \lg^{\varepsilon_2} n \times (\lg\lg n / \lg n)^2)$. In order to reduce
the cost of the pointers between the internal nodes in $R(v)$ to $O(\lg n)$ bits per
pointer, we make use of well known memory blocking techniques for dynamic
data structures (e.g., see [13, Appendix J]). The main idea is to allocate a fixed
memory area for the entire ranking tree, and perform all updates to the ranking
tree using memory from this area. After $O(m)$ updates, we allocate a new area
and copy over the entire ranking tree. The cost of using this memory blocking
will amount to $O(1)$ amortized time per update. Thus the overall space is
$O(m (\lg\lg n)^2 / \lg^{1-\varepsilon_1} n + w)$ bits, since we must count the pointer to the root of
the ranking tree, stored at the start of the fixed memory area.

Now we analyze the overall construction time. Based on Lemmas 3 and 4 we
can construct all the internal nodes in $O(m)$ time, since the number of internal
nodes is $O(m / \lg^{\varepsilon_2} n \times (\lg\lg n / \lg n)^2)$, and each requires $o(\lg^{1+\varepsilon_1+\varepsilon_2} n) + O(\lg^{\varepsilon_2} n)$
time to construct.
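The two products used in this proof can be expanded explicitly. The following is a worked sketch of the arithmetic only, using the node count and per-node costs stated above:

```latex
\begin{align*}
\underbrace{\frac{m}{\lg^{\varepsilon_2} n} \cdot \frac{(\lg\lg n)^2}{\lg^2 n}}_{\text{internal nodes}}
\cdot \underbrace{\lg^{1+\varepsilon_1+\varepsilon_2} n}_{\text{bits per node}}
  &= \frac{m (\lg\lg n)^2 \, \lg^{1+\varepsilon_1} n}{\lg^2 n}
   = \frac{m (\lg\lg n)^2}{\lg^{1-\varepsilon_1} n}, \\[4pt]
\frac{m}{\lg^{\varepsilon_2} n} \cdot \frac{(\lg\lg n)^2}{\lg^2 n}
\cdot \Bigl( o(\lg^{1+\varepsilon_1+\varepsilon_2} n) + O(\lg^{\varepsilon_2} n) \Bigr)
  &= o\!\left(\frac{m (\lg\lg n)^2}{\lg^{1-\varepsilon_1} n}\right)
   + O\!\left(\frac{m (\lg\lg n)^2}{\lg^2 n}\right)
   = O(m).
\end{align*}
```

The last step uses $\varepsilon_1 < 1$, so that $(\lg\lg n)^2 / \lg^{1-\varepsilon_1} n$ tends to zero.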

C Full Details of Fast Query Algorithm 

The main idea of the fast method is to use word-level parallelism and the second
representation of each matrix in order to speed up the query time. When we
begin our search in the root node $v$ of the tree $T$, consider the sections of $k$,
denoted $k_1, \ldots, k_g$. We query $R(v)$, using the first section $k_1$ to guide the search.
In order to remove the $f_1$ factor from the slow method, in each internal node
$u$ we subtract the packed words $w^u_{f_2,1} - w^u_{q,1}$, then add them to a running total
$w_1$. After we have summed the differences between the packed words in all the
internal nodes, we add the relevant sections from the canonical leaf nodes using
Lemma 5.

Since $w_1$ is only a rough approximation of the first section of $M'$, each value
in the packed word might be off by $O(\lg n / \lg\lg n)$: the number of additions and
subtractions we used to compute $w_1$. This means there may be errors, caused by
carry bits, in possibly the $O(\lg\lg n)$ least significant bits in each value stored in
the packed word $w_1$. We scan each entry in $w_1$ to determine the indices of the
first and last entries in $w_1$ that match $k_1$, except for the last $O(\lg\lg n)$ bits^4,
as well as the first value that is greater than $k_1$ in a more significant bit beyond
the $O(\lg\lg n)$ least significant bits. Let $K = \{e_1, \ldots, e_j\}$ denote this set of entries
in the packed word $w_1$.

Next, we check the largest and smallest entries, $e_1$ and $e_j$, in $K$ in order to
determine if one of these is $\tau$. This can be done in $O(\lg n / \lg\lg n)$ time. If neither
$e_1$ nor $e_j$ is $\tau$, then there are several cases for how to proceed. If there are only a
constant number of values in $K$, then we can compute the index, $\tau$, of the child
of $v$ that we should branch to, by computing the entries in $M'$ for these values
in $O(\lg n / \lg\lg n)$ time each. We call this the good case. However, if there are a
non-constant number of entries in $K$, then we are in the bad case, and we must
do a binary search over $M'$ to determine $\tau$. This costs $O(\lg\lg n \times \lg n / \lg\lg n) =
O(\lg n)$ time in total.

The key observation is that after we do the binary search for $\tau$ in the bad
case, at no point in the future will we ever have to examine the first section
of $k$ or the matrix structures. This is because when we are in the bad case,
the difference between the first sections of $q'_{\tau+1}$ and $q'_\tau$ is a value that can be
stored in $O(\lg\lg n)$ bits, which is why the overlap between sections was set to
be $O(\lg\lg n)$. Moreover, since there are only $g = O(\lg^{1-\varepsilon_1} n)$ sections, we can
spend at most $O((\lg n)/g) = o(\lg n / \lg\lg n)$ time in the bad case before we have
exhausted all of the bits in the matrix structures; once there are no more bits, we
are guaranteed to have found the $k$-th smallest $y$-coordinate in the query range.
Since the good case requires $O(\lg n / \lg\lg n)$ time, and there are $O(\lg n / \lg\lg n)$
levels in $T$, our search costs at most $O((\lg n / \lg\lg n)^2)$ time.

4 This can be done in constant time using parallel subtraction as in [5], but that is
only necessary in the static case, where we are not allowed to spend an additive
$O(\lg^{\varepsilon_1} n)$ factor at each level in $T$.

D Proof of Theorem 1

The query time follows from Lemma 7 and the space from Lemmas 2 and 6. All
that remains is to analyze the update time.

We can insert a point p into S_x in O(lg n) time. The node structure of T can
be updated in O(lg^{ε_1} n) amortized time by the properties of weight-balanced
B-trees. Similarly, the node structure of R(v) for each node v in the update path
in T can be updated in O(lg^{ε_2} n) amortized time, and there are O(lg n / lg lg n)
ranking trees that are updated. Thus, the tree structure of T and the ranking
trees in each node can be updated in o((lg n / lg lg n)^2) time.

The difficulty arises when a node v in T splits, since the index of v relative
to the other children of v_p has changed. In this case, we are required to not only
rebuild R(v_p), but also the substring of Y_ℓ that stores Y(v_p). If T(v) contains m
points, then Y(v_p) is a string of length O(m × lg^{ε_1} n), and constructing R(v_p)
takes O(lg^{ε_1} n × m) time after O(m) updates, by Lemma 6. This is
O(lg^{ε_1} n × lg n / lg lg n) amortized time in total, since splits can occur in
each level of T.
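
The amortization step can be spelled out as follows. This is only a sketch: it assumes, as is standard for weight-balanced B-trees but not restated in this appendix, that Θ(m) updates must occur in the subtree of v between two consecutive splits of v.

```latex
\underbrace{\frac{O(m \lg^{\varepsilon_1} n)}{\Theta(m)}}_{\text{rebuild cost per update, one level}}
  = O(\lg^{\varepsilon_1} n),
\qquad
\underbrace{O\!\left(\frac{\lg n}{\lg\lg n}\right)}_{\text{levels of } T}
  \cdot\, O(\lg^{\varepsilon_1} n)
  = O\!\left(\lg^{\varepsilon_1} n \cdot \frac{\lg n}{\lg\lg n}\right).
```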

One issue is that this analysis assumes that we have access to the updated
version of Y(v_p), storing the indices of the children of v_p, sorted by
x-coordinate, after the split. We now explain the technical details of how to
compute this string, a task that requires a few more definitions. Let v_1 and v_2
denote the two nodes into which v splits, and let c_1, ..., c_f, where
f ∈ O(lg^{ε_1} n) is the degree of v, denote the left-to-right sequence of children
of v before the split. Suppose that after the split, children c_1, ..., c_d become
the children of v_1 and c_{d+1}, ..., c_f become the children of v_2, where
1 < d < f, and d = Θ(lg^{ε_1} n). Also suppose that v is the i-th child of v_p, and
denote the degree of v_p as f_p.

First, we extract the strings Y(v_p) and Y(v) from Y_ℓ and Y_{ℓ+1}: this requires
O(m lg^{ε_1} n + lg n / lg lg n) time in total, since we must traverse a
root-to-leaf path in the dynamic strings. The next step is to scan both strings
Y(v_p) and Y(v) together, and at the same time write an updated string Y'(v_p),
which will be the sequence of indices of v_p's children after v is split. When we
encounter the index c in Y(v_p), we append c to Y'(v_p) if c ∈ {1, ..., i−1}, and
c + 1 if c ∈ {i+1, ..., f_p}. In the case when c = i, we check the corresponding
index c' in Y(v): if c' ∈ {1, ..., d}, then we append c to Y'(v_p), and c + 1
otherwise. For example, let Y(v) = {1, 2, 3, 4, 1},
Y(v_p) = {1, 3, 3, 2, 4, 1, 2, 3, 3, 4, 3, 1}, and v be the third child of v_p.
Suppose that v is to be split so that children 1 and 2 become the children of v_1
and 3 and 4 become the children of v_2. Then, following the steps we just
described, Y'(v_p) = {1, 3, 3, 2, 5, 1, 2, 4, 4, 5, 3, 1}. Overall, generating
Y'(v_p) takes O(m lg^{ε_1} n + lg n / lg lg n) time, since we do one scan through
both Y(v_p) and Y(v). The additive O(lg n / lg lg n) term is absorbed in all but a
constant number of levels near the bottom of T, where
m = o(lg^{1-ε_1} n / lg lg n). Thus, the string generation algorithm described
above requires O(lg^{ε_1} n × lg n / lg lg n) amortized time, when we consider that
each level in T can split during an insertion.
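
The simultaneous scan of Y(v_p) and Y(v) can be sketched in a few lines of Python. This is an illustration only: the function name `relabel_after_split` and the use of plain lists in place of the dynamic strings are assumptions made for the sketch, and the cost of extracting the strings from the dynamic string structures is not modelled.

```python
def relabel_after_split(y_vp, y_v, i, d):
    """Build Y'(v_p) from Y(v_p) and Y(v) in one left-to-right scan.

    y_vp : child indices (1-based) stored at the parent v_p
    y_v  : child indices (1-based) stored at v, the i-th child of v_p
    d    : children 1..d of v go to v_1, the rest go to v_2
    (Plain Python lists stand in for the paper's dynamic strings.)
    """
    out = []
    v_entries = iter(y_v)  # consumed once per occurrence of i in y_vp
    for c in y_vp:
        if c < i:
            out.append(c)          # children left of v keep their index
        elif c > i:
            out.append(c + 1)      # children right of v shift by one
        else:
            c_prime = next(v_entries)
            # The point stays under index i if it falls in v_1, else moves to i + 1.
            out.append(c if c_prime <= d else c + 1)
    return out


if __name__ == "__main__":
    # The worked example from the text: v is the third child, split after child 2.
    print(relabel_after_split([1, 3, 3, 2, 4, 1, 2, 3, 3, 4, 3, 1],
                              [1, 2, 3, 4, 1], i=3, d=2))
```

Run on the example from the text, the scan reproduces the string Y'(v_p) given there, and each entry of either input is visited exactly once.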

When a split occurs in T, we must also do a batched update on the dynamic
string Y_ℓ, where ℓ is the level of v_p. To do this, we make use of the batched
insertion operation from Lemma 1. When |T(v_p)| > 5W / O(lg lg n), where W is
the value in Lemma 1, we can replace the O(m lg^{ε_1} n) values representing Y(v_p)
with Y'(v_p) in O(m lg^{ε_1} n) time. However, in the alternative case, when
|T(v_p)| is small, we just directly insert and delete values into Y_ℓ in
O(lg n / lg lg n) time per operation. As with the string generation algorithm, the
case where |T(v_p)| is too small for batched updates only occurs in a constant
number of bottom levels of T. Thus, updating Y_ℓ for every level in T in which
a node is split takes O(lg^{ε_1} n × lg n / lg lg n) amortized time overall.

Finally, we consider the more common case where v does not split during
the insertion, and how to update R(v). Consider the update path in R(v), and
an arbitrary node u on this path. We can update the searchable partial sums
structure in u in O(1) time in the worst case by Lemma 3. If u is split, then the
cost of rebuilding the searchable partial sums structure is absorbed by the cost
of rebuilding u. The matrix structures can be rebuilt in O(1) amortized time
per internal node on the update path by Lemma 4, or O(lg n / lg lg n) amortized
time per ranking tree. Each conceptual leaf in R(v) takes O(lg n / lg lg n) time to
update by Lemma 1, since we must update the dynamic string, but there
are at most two leaves updated per ranking tree. Overall we get that the cost of
each update is O((lg n / lg lg n)^2) amortized time.
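
Collecting the terms from the preceding paragraphs gives the claimed bound. The summary below is a sketch that assumes 0 < ε_1, ε_2 < 1, so that the factors lg^{ε_1} n and lg^{ε_2} n are o(lg n / lg lg n); the constants themselves are fixed earlier in the paper.

```latex
\begin{align*}
\text{update cost}
 &= \underbrace{O(\lg n)}_{\text{insert into } S_x}
  + \underbrace{o\!\left(\left(\tfrac{\lg n}{\lg\lg n}\right)^{2}\right)}_{\text{node structure of } T \text{ and of each } R(v)}
  + \underbrace{O\!\left(\lg^{\varepsilon_1} n \cdot \tfrac{\lg n}{\lg\lg n}\right)}_{\text{splits: } R(v_p),\; Y'(v_p),\; Y_\ell} \\
 &\quad + \underbrace{O\!\left(\tfrac{\lg n}{\lg\lg n}\right)}_{\text{ranking trees on the path}}
    \cdot \underbrace{O\!\left(\tfrac{\lg n}{\lg\lg n}\right)}_{\text{cost per ranking tree}} \\
 &= O\!\left(\left(\tfrac{\lg n}{\lg\lg n}\right)^{2}\right) \quad \text{amortized.}
\end{align*}
```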

E Proof of Theorem 2

The data structure is roughly the same as the tree T from Theorem 1, except
that now we need a few extra techniques to avoid paying for more than a constant
number of pointers. The first idea is to use a generalized wavelet tree T' with
fan-out O(lg^{ε_1} n) over the universe [1..σ], instead of the original weight-balanced
B-tree T, in the same manner as Ferragina et al. [7]. The tree T' has height
O(lg σ / lg lg n), and, as in the tree T, we store dynamic strings Y_ℓ at each level
in T', as well as ranking trees for each node of T'.
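
To make the role of the generalized wavelet tree concrete, the following Python sketch answers range selection queries on a static array with a multiary wavelet tree. It is not the succinct dynamic structure of the theorem: the class `WaveletNode`, the function `range_select`, and the explicit per-child prefix-count tables (which stand in for the dynamic strings Y_ℓ and the ranking trees) are all assumptions made for illustration, and the space usage is far from succinct.

```python
class WaveletNode:
    """Node of a static multiary wavelet tree over symbols lo..hi (inclusive)."""

    def __init__(self, values, lo, hi, fanout):
        assert fanout >= 2, "fan-out must be at least 2"
        self.lo, self.hi = lo, hi
        self.children = []
        self.prefix = []
        if lo == hi or not values:
            return  # single symbol or empty subsequence: no further branching
        size = hi - lo + 1
        width = -(-size // fanout)   # symbols per child, rounded up
        nchild = -(-size // width)   # number of non-empty symbol ranges
        # digits[j] is the child that position j is routed to.
        digits = [(v - lo) // width for v in values]
        # prefix[c][j] = number of positions before j routed to child c.
        self.prefix = [[0] * (len(values) + 1) for _ in range(nchild)]
        for j, dgt in enumerate(digits):
            for c in range(nchild):
                self.prefix[c][j + 1] = self.prefix[c][j] + (c == dgt)
        for c in range(nchild):
            clo = lo + c * width
            chi = min(hi, clo + width - 1)
            sub = [v for v, dgt in zip(values, digits) if dgt == c]
            self.children.append(WaveletNode(sub, clo, chi, fanout))


def range_select(root, left, right, k):
    """Return the k-th smallest (1-based) value among positions left..right-1."""
    if not 1 <= k <= right - left:
        raise IndexError("k is outside the query range")
    node = root
    while node.children:
        for c, child in enumerate(node.children):
            a, b = node.prefix[c][left], node.prefix[c][right]
            if k <= b - a:           # the answer lies in child c
                node, left, right = child, a, b
                break
            k -= b - a               # skip all values routed to child c
    return node.lo
```

Each step of the descent maps the query range into the chosen child using two prefix counts per child; in the paper's structure these counts are supplied by the ranking trees rather than by explicit tables.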

One issue is that the pointers to the ranking trees stored in each node of T'
occupy too much space: the lowest level in T' in which we store ranking trees has
O(n (lg lg n)^2 / lg^{2+ε_1} n) nodes, and therefore O(n w (lg lg n)^2 / lg^{2+ε_1} n)
bits are required for pointers to the roots of the ranking trees. However, since
the maximum number of bits occupied by ranking trees at any level is
O(n (lg lg n)^2 / lg^{1-ε_1} n) = o(n) (not counting the pointers to their roots),
we can use the same technique as in Lemma 6 and group all the ranking trees at a
given level in T' into a fixed memory area of size o(n). Thus, we can replace all
of the O(w)-bit pointers to the roots of the ranking trees with O(lg n)-bit
pointers, and the space for pointers to the ranking trees becomes
o(n) + O(w lg σ / lg lg n): one pointer to the fixed memory area per level in the
tree. We also need O(w lg σ / lg lg n) bits to store the pointers to the dynamic
strings at each level. At this point, we can use the technique of Mäkinen and
Navarro [TH] to further reduce this to O(w) bits.

Finally, since we are working solely with ranks rather than x-coordinates, we
can discard the red-black tree S_x. To analyze the total space cost we sum the
space required by the dynamic strings and the ranking trees at each level in T.
Thus, the total number of bits required is:




By the same arguments presented in [7], this simplifies to: 




